K-Attractors: A Partitional Clustering Algorithm for Numeric Data Analysis

Authors

  • Yiannis Kanellopoulos
  • Panagiotis Antonellis
  • Christos Tjortjis
  • Christos Makris
  • Nikos Tsirakis

Abstract

Clustering is a data analysis technique that is particularly useful when the data have many dimensions and little is known about them in advance. Partitional clustering algorithms are efficient but sensitive to the initial partition and to noise. We propose k-Attractors, a partitional clustering algorithm tailored to numeric data analysis. As a pre-processing (initialization) step, it uses maximal frequent itemset discovery and partitioning to determine the number of clusters k and the initial cluster “attractors”. In its main phase, the algorithm uses a distance measure that is precisely adapted to the way the initial attractors are determined. We applied k-Attractors, as well as the k-Means, EM and FarthestFirst clustering algorithms, to several datasets and compared the results. In most cases the comparison favored k-Attractors in terms of convergence speed and the quality of the resulting clusters: it outperformed the other three algorithms except on datasets of very small cardinality that contain only a few frequent itemsets. On the downside, its initialization phase adds an overhead that can be considered acceptable only when it contributes significantly to the algorithm’s accuracy.
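
The abstract only outlines the algorithm, so the following is a minimal sketch of the general idea, not the authors' implementation. Several choices here are assumptions: numeric attributes are turned into items by equal-width binning, maximal frequent itemsets are found by brute-force subset counting (workable only for a handful of attributes), each maximal itemset yields one initial attractor (the mean of the rows that contain it), so k equals the number of maximal itemsets, and plain Euclidean distance stands in for the paper's adapted distance measure. All function names are hypothetical.

```python
import math
from itertools import combinations


def discretize(data, bins):
    """Turn each numeric row into a set of (attribute, bin) items using equal-width bins."""
    cols = list(zip(*data))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    transactions = []
    for row in data:
        items = set()
        for j, value in enumerate(row):
            # Fall back to width 1.0 for constant columns to avoid division by zero.
            width = (highs[j] - lows[j]) / bins or 1.0
            items.add((j, min(int((value - lows[j]) / width), bins - 1)))
        transactions.append(frozenset(items))
    return transactions


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force counting of every sub-itemset; assumption, not a real miner."""
    counts = {}
    for t in transactions:
        items = sorted(t)
        for r in range(1, len(items) + 1):
            for combo in combinations(items, r):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    frequent = [s for s, c in counts.items() if c >= min_support]
    # An itemset is maximal if no other frequent itemset strictly contains it.
    return [s for s in frequent if not any(s < other for other in frequent)]


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def initial_attractors(data, transactions, itemsets):
    """One attractor per maximal itemset: the mean of the rows that contain it."""
    return [
        centroid([row for row, t in zip(data, transactions) if s <= t])
        for s in itemsets
    ]


def refine(data, attractors, max_iter=100):
    """k-Means-style main phase; Euclidean distance stands in for the paper's measure."""
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(attractors)), key=lambda i: math.dist(row, attractors[i]))
            for row in data
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for i in range(len(attractors)):
            members = [row for row, lab in zip(data, labels) if lab == i]
            if members:
                attractors[i] = centroid(members)
    return labels, attractors


def k_attractors_sketch(data, bins=2, min_support=2):
    """Assumes at least one itemset reaches min_support; returns (k, labels, attractors)."""
    transactions = discretize(data, bins)
    itemsets = maximal_frequent_itemsets(transactions, min_support)
    attractors = initial_attractors(data, transactions, itemsets)
    labels, attractors = refine(data, attractors)
    return len(attractors), labels, attractors
```

For two well-separated groups such as `[[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [9.0, 9.0], [9.2, 8.8], [8.9, 9.1]]`, `k_attractors_sketch` finds two maximal itemsets, so k = 2 without being supplied, and returns the labels `[0, 0, 0, 1, 1, 1]`. This illustrates the point the abstract makes about initialization: k and the starting attractors come from the itemset step rather than from the user or random seeding.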

Similar resources

Automatic Clustering of Mixed Data Using a Genetic Algorithm

In the real world clustering problems, it is often encountered to perform cluster analysis on data sets with mixed numeric and categorical values. However, most existing clustering algorithms are only efficient for the numeric data rather than the mixed data set. In addition, traditional methods, for example, the K-means algorithm, usually ask the user to provide the number of clusters. In this...

A cluster centers initialization method for clustering categorical data

Keywords: k-modes algorithm; initialization method; initial cluster centers; density; distance. Abstract: The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, the performance of the k-modes clustering algorithm, which converges to numerous local minima, strongly depends on initial cluster centers...

Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm

Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining and a common technique for statistical data analysis. This paper proposed an improved version of K-Means algorithm, namely Persistent K...

Automatic Clustering Approaches Based On Initial Seed Points

Since clustering is applied in many fields, a number of clustering techniques and algorithms have been proposed and are available in the literature. This paper proposes a novel approach to address the major problems of partitional clustering algorithms, such as choosing an appropriate value of K and selecting the K initial seed points. The performance of any partitional clustering algorithms d...

A Detailed Study and Analysis of different Partitional Data Clustering Techniques

The concept of Data Clustering is considered to be very significant in various application areas like text mining, fraud detection, health care, image processing, bioinformatics etc. Due to its application in a variety of domains, various techniques are presented by many research domains in the literature. Data Clustering is one of the important tasks that make up Data Mining. Clustering can be...

Journal:
  • Applied Artificial Intelligence

Volume: 25

Publication date: 2011